1. Introduction

Assume that we observe a bivariate dataset $\{X_i, Y_i\}_{i=1}^n$ that follows the regression
model,

$$Y_i = \mu(X_i) + \sigma(X_i)\varepsilon_i, \tag{1}$$



where $\mu$ is the regression function and $\sigma$ is a deterministic scale function. Also,
$\varepsilon_i$ and $X_i$ are the error and random design variables respectively (both being
possibly long-range dependent) and $X_i$ has cumulative distribution function
$F = F_X : \mathbb{R} \to [0, 1]$ that is strictly increasing.
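
To make the setting concrete, the following minimal sketch simulates data from model (1). Everything specific in it is an assumption chosen for illustration and is not taken from the paper: the piecewise-linear regression function `mu` with a kink at `theta`, the scale function `sigma`, the standard normal design and the i.i.d. Gaussian errors (the paper also allows long-range dependent errors and design variables).

```python
import random


def mu(x, theta=0.0, jump=2.0):
    """Hypothetical regression function: slope 0 left of theta, slope `jump` right of it."""
    return jump * max(x - theta, 0.0)


def sigma(x):
    """Hypothetical scale function, bounded away from 0 and infinity."""
    return 1.0 + 0.5 / (1.0 + x * x)


def simulate_model(n, theta=0.0, jump=2.0, seed=0):
    """Draw (X_i, Y_i), i = 1..n, from Y_i = mu(X_i) + sigma(X_i) * eps_i."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]   # design with a strictly increasing CDF on R
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]  # i.i.d. errors (a simplifying assumption)
    ys = [mu(x, theta, jump) + sigma(x) * e for x, e in zip(xs, eps)]
    return xs, ys
```

The jump in the first derivative of this `mu` at `theta` equals `jump`, which is the quantity the kink detection procedure has to pick up.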

We are interested in testing for the presence of a change point in the slope of a
regression function $\mu$ and, if one exists, estimating its location. We describe this
jump in the first derivative of $\mu$ as a kink and denote the change point by $\theta$.
Knowledge of this change point will allow us to identify change in trends in the 
underlying regression function of a non-parametric model. This could explain 
the change in qualitative or quantitative behaviour of an underlying process. 

1.1. Existing Results 

Before examining the kink estimation under the random design regression model 
(1), we first look at other non-parametric and parametric models and their link 
to the existing theory for kink point estimation. A change point estimation 
technique was pioneered by Goldenshluger, Tsybakov and Zeevi (2006) for es- 
timating change points in the regression function itself, not the kink scenario. 
The underlying model assumed for their framework was the indirect model with 
fixed design. The indirect model assumes that the regression function is not 
observed in practice but a so called 'blurred' version of the regression function 
is observed whereby the regression function has been transformed by a convo- 
lution operator. More specifically, the indirect model assumes that observations 
are realisations of the asymptotic model, 

$$dY(x) = K\mu(x)\,dx + \epsilon\,dB(x). \tag{2}$$

In the above model the function $K\mu(x) := \int_{\mathbb{R}} K(t - x)\mu(t)\,dt$ represents the
convolution of $\mu$ and $K$, and the noise is driven by a regular Brownian motion
$B(x)$ and controlled by $\epsilon \asymp n^{-1/2}$, where the statement $a_n \asymp b_n$ means that
the ratio $a_n/b_n$ is bounded above and below by positive constants. The fixed
design implies that the design variables $x_i = i/n$ are equally spaced points on
the unit interval. The asymptotic model (2) is considered due to a result
by Brown and Low (1996) that shows (2) is asymptotically equivalent to the
model,

$$Y_i = K\mu(x_i) + z_i, \tag{3}$$

where $z_i$ is an i.i.d. sequence of error variables.

The specific estimation technique that Goldenshluger, Tsybakov and Zeevi 
(2006) formulated was the zero-crossing technique, which uses a particular class of
kernel functions to identify the change point. Their technique will be adapted for
use in this article and is pursued in further detail in Section 4.1. At this stage 
it will suffice to say that the main result of their paper established that the 
zero-crossing technique is optimal in the minimax sense under the framework 
given in (2). 

The zero-crossing technique has been applied by Cheng and Raimondo (2008) 
to estimate a kink instead of a jump point and was done in the direct model 
in the fixed design setting. In this framework the observations are assumed to 
follow a fixed design and realisations derived from the following asymptotic 
model, 

$$dY(x) = \mu(x)\,dx + \epsilon\,dB(x). \tag{4}$$

Model (3) and its asymptotic equivalents are usually appropriate in practice
when a variable is observed at regular intervals indexed by time and the errors 
are i.i.d. homoscedastic random variables. 

More recently, Wishart (2009) extended the technique further to include long- 
range dependent (LRD) noise observations instead of independent noise. The 
kink estimation technique was extended to include the model, 

$$dY(x) = \mu(x)\,dx + \epsilon^{\alpha}\,dB_H(x), \tag{5}$$

where $B_H(x)$ is a fractional Brownian motion with self-similarity index $H \in (1/2, 1)$.
The noise process was normalised by $\epsilon^{\alpha}$ where $\alpha = 2 - 2H$. Wang (1996)
has shown that Model (5) is the asymptotic equivalent to the discrete model,

$$Y_i = \mu(x_i) + \varepsilon_i, \tag{6}$$

where $\varepsilon_i$ is an LRD sequence of random variables.

In this paper we are interested in model (1), which extends the fixed de- 
sign cases given in models (3), (6) above. They are extended in the sense that 
the design points are no longer restricted to a uniform grid of points and the 
scale function $\sigma(\cdot)$ allows heteroscedasticity for the error terms in the regression
model. The analysis of this random design model needs to be considered quite 
carefully, since the asymptotic behaviour of the estimators will depend on the be- 
haviour of the scale function and on the level of dependence present in the design 
variables and errors themselves. It has been shown by Reiß (2008) that there exists
an asymptotic equivalence between model (1) and (4) when $\sigma(\cdot)$ is constant
and the design variables are independent uniform random variables. However,
this is not the case in general. As noted in Kulik and Raimondo (2009a), with
LRD design variables, model (1) cannot be equivalent to any asymptotic model,
which is in contrast to model (5) being the asymptotic equivalent to model (6) 
in the fixed design case. 

There is an extensive treatment in the literature on both parametric and non- 
parametric methods for regression models with a random design framework that 
assume i.i.d. design and error variables. The methodologies used include, but 
are not limited to, kernel smoothing, wavelet decompositions and orthogonal 
series. The methods of change point estimation for the random design case 
have been considered in Gijbels, Hall and Kneip (1999); Huh and Park (2004); 
Korostelev and Tsybakov (1993).

There is also literature on the fixed design scenario in the presence of long- 
range dependent errors and the introduction of dependence in the errors always 
has a detrimental effect on estimation in this scenario. In the context of func- 
tion estimation some recent treatments of this topic include Cavalier (2004); 
Csorgo and Mielniczuk (1995); Johnstone (1999); Johnstone and Silverman (1997); 
Kulik and Raimondo (2009a); Wang (1996). For change point estimation work 
has been done by Wang (1999); Wishart (2009). 

Then there is a new emerging literature that attempts to combine the two sce- 
narios with random design regression models where the design variables and/or 
the error variables are LRD. When the framework includes a random design 
and possibly LRD variables then there is a more subtle asymptotic theory that 
is based on a delicate balance between the behaviour of the $\sigma$ function and the
level of dependence present. This is evident in a number of current papers in
the area and will be the case here as well. The interested reader is referred to 
work by Guo and Koul (2008); Robinson and Hidalgo (1997) for a parametric 
linear model approach in this context and to Csorgo and Mielniczuk (1999); 
Kulik and Raimondo (2009b); Mielniczuk and Wu (2004); Yang (2001) for re- 
gression estimation in a non-parametric framework. Finally some studies to 
estimate change points in the non-parametric context include Lin, Li and Chen 
(2008); Wang (2008). 

1.2. Article Outline 

Some preliminary framework is outlined in Section 2, setting up the class of 
functions that are considered and specific dependence assumptions made in the 
random design model. The main result of the paper is described in Section 3, 
along with a brief discussion. The estimation method is explained in detail in 
Section 4, with a brief outline of the zero-crossing technique in the fixed design 
and its extension to the random design case. All the necessary proofs of the 
results are given in Section 5. 

2. Preliminaries

2.1. Smoothness Assumptions and Kernels 

First we look at the smoothness of the regression function $\mu$ and the properties
of the kernel function that was constructed by Cheng and Raimondo (2008) for use
with the zero-crossing technique. We begin by defining a class of functions that have
domain $\mathcal{X}$, a kink at $\theta$ and $s \ge 3$ derivatives that exist in the neighbourhood of
$\theta$.

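Definition 1. We say that $\mu \in \mathcal{F}_s(\mathcal{X}, \theta)$ if,

1. $\mu : \mathcal{X} \to \mathbb{R}$.

2. $\mu$ has a kink, that is, there exists a $\theta \in \mathcal{X}$ and $a_\mu \in \mathbb{R}$ with $a_\mu \neq 0$ such
that,

$$[\mu^{(1)}](\theta) := \mu^{(1)}(\theta^+) - \mu^{(1)}(\theta^-) = a_\mu,$$

where $\mu^{(1)}(\theta^+)$ and $\mu^{(1)}(\theta^-)$ are the right and left first derivatives of $\mu$
respectively.

3. The higher order derivatives $\mu^{(i)}$ exist and are finite everywhere and satisfy,

$$\mu^{(i)}(\theta^+) = \mu^{(i)}(\theta^-) \quad \text{for } i = 2, 3, \ldots, s-1. \tag{7}$$

4. For all $x_+ \in (0, \sup \mathcal{X} - \theta)$ and $x_- \in (\inf \mathcal{X} - \theta, 0)$,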

1=1 

Condition 4 should be interpreted in the sense that $\mu^{(1)}$ has a separate
Taylor expansion for points to the left and right of $\theta$ respectively. Condition 3
of Definition 1 might seem overly restrictive but is required to exploit the
class of kernel functions that are introduced later in this Section. We will also
denote $\mathcal{F}_s(\theta) = \mathcal{F}_s(\mathbb{R}, \theta)$. For completeness and comparison purposes we will
also introduce another smoothness class, $\mathcal{C}_s$, to denote the class of functions that
do not have a kink. This class is identical to $\mathcal{F}_s(\theta)$ except that conditions 2 and 3
of Definition 1 are relaxed in the sense that there does not exist a $\theta \in \mathbb{R}$ such that
$[\mu^{(1)}](\theta) \neq 0$.

In the fixed design setting, we can assume that the domain of the regression
function is [0, 1] since any finite interval $[a, b]$ can be mapped to the [0, 1] interval
by an affine transformation. However, this assumption is not always valid in the
general random design case. In particular, if the design variables are LRD then 
it is required that they have a domain across the whole real line. 

To use the zero-crossing technique for this class of regression functions, Cheng and Raimondo
(2008) constructed a class of kernel functions via Legendre polynomials and we
will denote this class of functions by $\mathcal{K}_s$. The full description of the zero-crossing
technique and the consequent technical details required of the kernel functions
are not covered here and the reader is referred to Goldenshluger, Tsybakov and Zeevi
(2006) and Cheng and Raimondo (2008) respectively for full treatment. However,
some key aspects will be given and for our case we will say $K \in \mathcal{K}_s$, where
$s = 2k + 1$ and $k \in \mathbb{Z}^+$ if,

$$K(x) = K(k, x) = a_k \sum_{j=k-1}^{2k+2} b_{j,k}\, x^{2j-2k+2}\, 1_{[-1,1]}(x),$$

where the polynomial coefficients are defined by

$$a_k = \frac{(4k+5)!}{2^{4k+5}\,(2k)!\,(2k+2)!}, \qquad b_{j,k} = \frac{(-1)^{k+j+1}\,(2j)!}{j!\,(2k-j+2)!\,(2j-2k+2)!}.$$

This class of kernel functions is indexed by the level of smoothness $s$ and is
constructed to exploit the extra smoothness of the class $\mathcal{F}_s(\theta)$. To save on
notation we denote $K_i = K^{(i)}$ to represent the $i$th order derivative of $K$. The
kernels have the following properties:

$$K_i(-1) = K_i(1) = 0 \ \text{ for } i = 1, 2, 3, \quad \text{and} \quad K_1(0) = 0. \tag{9}$$

$$\int_{-1}^{1} u^j K_3(u)\,du = 0, \quad j = 0, 1, \ldots, 2k. \tag{10}$$

Property (10) of $K$ ensures that the smoothness of $\mathcal{F}_s(\theta)$ can be exploited to
obtain faster rates of convergence of the estimator $\hat\theta$ in estimating $\theta$. For our
purposes of estimation assume that $\mu \in \mathcal{F}_s(\theta)$ and $\sigma \in \mathcal{C}_r$ where $s \wedge r \ge 3$.
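
Properties (9) and (10) can be checked mechanically. The sketch below assumes that the kernel is the polynomial $K(k,x) = a_k \sum_{j=k-1}^{2k+2} b_{j,k} x^{2j-2k+2}$ on $[-1,1]$ with $a_k = (4k+5)!/(2^{4k+5}(2k)!(2k+2)!)$ and $b_{j,k} = (-1)^{k+j+1}(2j)!/(j!(2k-j+2)!(2j-2k+2)!)$; this reading of the garbled formula is an assumption of the sketch, and the helper names are hypothetical. Exact rational arithmetic is used so that the checks involve no rounding.

```python
from fractions import Fraction
from math import factorial


def kernel_coeffs(k):
    """Coefficients c[p] of x**p in K(k, x) on [-1, 1], as exact fractions."""
    a_k = Fraction(factorial(4 * k + 5),
                   2 ** (4 * k + 5) * factorial(2 * k) * factorial(2 * k + 2))
    coeffs = [Fraction(0)] * (2 * k + 7)          # highest power is x**(2k + 6)
    for j in range(k - 1, 2 * k + 3):
        b_jk = Fraction((-1) ** (k + j + 1) * factorial(2 * j),
                        factorial(j) * factorial(2 * k - j + 2)
                        * factorial(2 * j - 2 * k + 2))
        coeffs[2 * j - 2 * k + 2] = a_k * b_jk
    return coeffs


def derivative(coeffs, order=1):
    """Coefficients of the derivative of the given order."""
    for _ in range(order):
        coeffs = [p * c for p, c in enumerate(coeffs)][1:]
    return coeffs


def evaluate(coeffs, x):
    """Exact value of the polynomial at a rational point x."""
    return sum(c * Fraction(x) ** p for p, c in enumerate(coeffs))


def moment(coeffs, j):
    """Exact integral of u**j * poly(u) over [-1, 1] (odd powers integrate to zero)."""
    return sum(c * Fraction(2, p + j + 1)
               for p, c in enumerate(coeffs) if (p + j) % 2 == 0)
```

Under this reading, $k = 1$ gives $K(x) = \tfrac{315}{512}(1 - x^2)^4$ on $[-1, 1]$, and in general $K$ is proportional to the $(2k-2)$th derivative of $(1 - x^2)^{2k+2}$, which is why repeated integration by parts makes the low-order moments of $K_3$ vanish.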

2.2. Dependence Assumptions 

Throughout the paper there will be a dependence assumption either among the 
design random variables or in the error random variables. In particular, the 
assumed dependence structure is a causal LRD linear process that is defined 
below. 

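Definition 2. Let $c_i$ be a set of square summable constant coefficients that are
defined,

$$c_i = \begin{cases} 1, & \text{if } i = 0, \\ i^{-(1+\alpha)/2} L(i), & \text{if } i \ge 1, \end{cases}$$

where $L : \mathbb{R}^+ \to \mathbb{R}^+$ is a slowly varying function and $0 < \alpha < 1$. Then, a
random variable $\xi_i$ is said to be a causal LRD linear process if,

$$\xi_i = \mu_\xi + \sum_{j=0}^{\infty} c_j \eta_{i-j},$$

where $|\mu_\xi| < \infty$ and $\eta_i$ are i.i.d. random variables with density $f_\eta$ and moments
$E\eta_i = 0$ and $E\eta_i^2 = \big(\sum_{j=0}^{\infty} c_j^2\big)^{-1} =: \nu^2$.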
Furthermore, a random variable $\xi_i$ is said to be a causal LRD Gaussian linear
process if $\xi_i$ satisfies Definition 2 and $\{\ldots, \eta_{i-1}, \eta_i\}$ are i.i.d. $N(0, \nu^2)$. The case
of $\alpha = 1$ is to be interpreted as a short range dependent case and by construction
the random variable has $E\xi_i = \mu_\xi$ and $\mathrm{Var}\,\xi_i = 1$. Moreover, it can be
shown that $\xi_i$ is a second-order stationary process and has asymptotic covariance
structure $\mathrm{Cov}(\xi_0, \xi_k) \sim C_\alpha k^{-\alpha} L^2(k)$ where $C_\alpha = \nu^2 \int_0^\infty (x^2 + x)^{-(1+\alpha)/2}\,dx$.
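
A causal LRD linear process of this kind can be approximated by truncating the infinite sum. The sketch below is illustrative only and rests on several assumptions that are not from the paper: the slowly varying function is taken as $L \equiv 1$, the series is cut off after `n_terms` coefficients, the innovations are Gaussian, and the function names are hypothetical. The innovation variance is chosen so that the truncated process has unit variance.

```python
import math
import random


def lrd_coefficients(alpha, n_terms):
    """c_0 = 1 and c_j = j^{-(1 + alpha) / 2} for j >= 1, taking L(j) = 1 (an assumption)."""
    return [1.0] + [j ** (-(1.0 + alpha) / 2.0) for j in range(1, n_terms)]


def simulate_lrd(n, alpha, n_terms=2000, mean=0.0, seed=0):
    """Truncated version of xi_i = mean + sum_j c_j * eta_{i-j}, for i = 1..n."""
    rng = random.Random(seed)
    c = lrd_coefficients(alpha, n_terms)
    # Innovation standard deviation chosen so that Var(xi_i) = 1 for the truncated series.
    nu = math.sqrt(1.0 / sum(cj * cj for cj in c))
    eta = [rng.gauss(0.0, nu) for _ in range(n + n_terms - 1)]
    # Shift the index so that every eta_{i-j} used below has been generated.
    offset = n_terms - 1
    return [mean + sum(c[j] * eta[i + offset - j] for j in range(n_terms))
            for i in range(n)]
```

Smaller values of `alpha` give slower decay of the coefficients and therefore stronger dependence; the truncation makes the far tail of the covariance only approximate.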

Therefore the process exhibits Long-Range Dependence and a consequence of 
this asymptotic covariance structure is that. 



Var ~ Cfn^-"L'in), Var ^ e 




C|n2-2«L4(„)^ ifO<a<i, 

C|n, if i < a < 1, 

(11) 

where := 2C^/((l-a)(2-a)), C| := 4C^/((l-2a)(2-2a)) and when 1/2 < 
a < 1, the covariances Cov (^q, are summable and C| = l-f-2 X^i^o ^'^^ i^o^^i) ■ 
Also, when a = 1/2, Var Cl) is asymptotically proportional to a term of 

order n times another term involving slowly- varying functions. Now throughout 
the paper, the design variables and error variables are assumed to follow one of 
the following dependence conditions: 

(A) The design variables $\{X_i\}_{i=1}^n$ are i.i.d. random variables with domain $\mathcal{X}$

and common density / such that f{x) > for allx & X and sup^^^ \ f^^^^'Hx)\ < 
$\infty$. The error variables $\{\varepsilon_i\}_{i=1}^n$ are a causal LRD process with parameter
$\alpha_\varepsilon$. Furthermore, the random variables $\{\varepsilon_i\}_{i=1}^n$ are assumed to be independent
of $\{X_i\}_{i=1}^n$. Under (A), define the associated set of $\sigma$-fields,

$$\mathcal{G}_i := \sigma(\ldots, \eta_{i-1}, \eta_i; X_1, X_2, \ldots, X_i).$$

(B) The design variables, {Xi} are a causal LRD linear process with parameter 

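where $f_X^{(j)}$ is a Lipschitz continuous function for $j = 0, 1, \ldots, s$ with
$f_X(x) > 0$ for all $x \in \mathbb{R}$. The error variables $\{\varepsilon_i\}_{i=1}^n$ are centred and
i.i.d., with a finite variance, independent of $\{X_i\}_{i=1}^n$. Similarly, define the
associated set of $\sigma$-fields,

$$\mathcal{F}_i := \sigma(\ldots, \eta_{i-1}, \eta_i; \varepsilon_1, \varepsilon_2, \ldots, \varepsilon_i).$$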

In both cases, the support of the design variables will be denoted $\mathcal{X}$. Let
$F = F_X$ be the cumulative distribution function of $X$, which is strictly increasing,
and denote by $F_n(x) = n^{-1}\sum_{i=1}^{n} 1_{\{X_i \le x\}}$ the empirical distribution function
of $X$. Also let $Q = F^{-1}$ and $Q_n = F_n^{-1}$ be the quantile and empirical quantile
functions respectively. We require that $Q$ is Lipschitz, that is, there exists an
$L_Q > 0$ such that

$$|Q(x) - Q(y)| \le L_Q |x - y|.$$

Finally, we need to impose some mild restrictions on $\sigma$. We assume $\sigma$ is bounded
away from 0 and $\infty$ in the sense that,

$$0 < \inf_{t} \sigma(t) \le \sup_{t} \sigma(t) < \infty$$

and that $\sigma \in \mathcal{C}_r$ where $r \ge 3$. Throughout the article we denote by $C$ a general
constant that is assumed to be positive and finite but which possibly changes
from line to line.

3. Main Result 

The main result of the paper is concerned with the construction and analysis
of an estimator, $\hat\theta$, of the kink location $\theta$. The analysis of the estimator is given
in Theorem 1 and concerns the rate of convergence of $\hat\theta$ to the true kink
location $\theta$. The estimator, $\hat\theta$, will be constructed in Section 4 along with the
motivations and analysis.

Theorem 1. Suppose a bivariate sequence of observations $\{X_i, Y_i\}$ that follow
model (1) are observed such that $\mu \in \mathcal{F}_s(\theta)$ and $\sigma \in \mathcal{C}_r$ where $s \wedge r \ge 3$. Then
an estimator, $\hat\theta$, of the change point, $\theta$, can be constructed such that,



The proof of this Theorem is given at the end of Section 4. The minimax optimality
of this result is not pursued in this paper since the lower bounds on the
convergence rate of $\hat\theta$ for the functional class $\mathcal{F}_s(\theta)$ are not determined in the
framework of random design. However, it is worth making the specific point that 
the obtained rate of convergence under Assumption (A) is the same as the min- 
imax rates for the fixed design case with i.i.d. errors (see Cheng and Raimondo 
(2008)). Consequently, it seems reasonable to conjecture that the rates of our 
estimator are optimal in the minimax sense. 

4. Kink estimation method 

In this section, the basis of the zero-crossing technique is studied and a brief 
overview given. Firstly, the zero-crossing technique pioneered by Goldenshluger, Tsybakov and Zeevi 
(2006) and applied by Cheng and Raimondo (2008); Wishart (2009) will be de- 
scribed briefly in Section 4.1 and then an adaptation for the random design case 
constructed in Sections 4.2 - 4.7. 

4.1. Approximation of the third derivative for the fixed design model

In the fixed design setting (cf. model (6)) it can be assumed without loss of
generality that the regression function $\mu$ has domain [0, 1]. More specifically,
assume that $\mu \in \mathcal{F}_s([0, 1], \lambda)$ and estimate $\mu^{(3)}(t)$ by,

$$K_h(t) = K_h(t, \mu) := h^{-4}\int_0^1 K_3\Big(\frac{x - t}{h}\Big)\mu(x)\,dx,$$



\9-9\ 




under Assumption (A), 



under Assumption (B). 
where $h = h(n)$ is the bandwidth that depends on $n$. Throughout the article it
will be assumed that the bandwidth satisfies, at the very least, $h + (nh)^{-1} \to 0$ as
$n \to \infty$. This is a standard regularity condition for kernel smoothing techniques
and additional conditions on the bandwidth will be stated as needed. Using the
functional class $\mathcal{F}_s([0, 1], \lambda)$ and the properties of the kernel function it can be shown
that for $t \in (h, 1 - h)$,

$$K_h(t) = h^{-2} K_1\Big(\frac{\lambda - t}{h}\Big)[\mu^{(1)}](\lambda) + O(h^{s-3}) =: L_h(t) + O(h^{s-3}), \tag{12}$$

where $L_h(t)$ is the localisation term. Indeed, by exploiting the conditions of
$\mathcal{F}_s([0, 1], \lambda)$ we can express $K_h(t)$ as follows. Change the variable of integration to obtain,

$$K_h(t) = h^{-4}\int_0^1 K_3\Big(\frac{x - t}{h}\Big)\mu(x)\,dx = h^{-3}\int_{-1}^{1} K_3(x)\,\mu(t + hx)\,dx.$$

The last equality follows because the domain of $K$ is $[-1, 1]$ and the values of
$t$ are restricted to $t \in (h, 1 - h)$. This restriction is used to avoid possible edge
bias effects from the two sided kernel function. Using integration by parts and
exploiting the boundary condition (9),

$$K_h(t) = -h^{-2}\int_{-1}^{1} K_2(x)\,\mu^{(1)}(t + hx)\,dx. \tag{13}$$

Let $D = \{t : |\lambda - t| < h\}$ and $\tau = (\lambda - t)/h$. Then $|\tau| < 1$ for all $t \in D$. We now
split (13) into two integrals,

$$K_h(t) = -h^{-2}\int_{-1}^{\tau} K_2(x)\,\mu^{(1)}(t + hx)\,dx - h^{-2}\int_{\tau}^{1} K_2(x)\,\mu^{(1)}(t + hx)\,dx.$$

To exploit $\mathcal{F}_s([0, 1], \lambda)$ define,

$$J_h(t) := -h^{-2}\Big(\int_{-1}^{\tau} K_2(x)\big(\mu^{(1)}(t + hx) - \mu^{(1)}(\lambda^-)\big)\,dx + \int_{\tau}^{1} K_2(x)\big(\mu^{(1)}(t + hx) - \mu^{(1)}(\lambda^+)\big)\,dx\Big) = O(h^{s-3}).$$

The order bound follows by using (7) and (8) in combination with (10). Therefore,
this allows us to express $K_h(t)$ in the following way,

$$K_h(t) = -h^{-2}\int_{-1}^{\tau} K_2(x)\,\mu^{(1)}(\lambda^-)\,dx - h^{-2}\int_{\tau}^{1} K_2(x)\,\mu^{(1)}(\lambda^+)\,dx + J_h(t) = h^{-2} K_1(\tau)[\mu^{(1)}](\lambda) + J_h(t) = L_h(t) + J_h(t).$$

This expansion ensures that $K_h(\cdot) = O(h^{-2})$ for $s \ge 3$, which is assumed to
always hold since the third derivative of $\mu$ needs to exist and be finite if it is to
be adequately estimated and the method is to make any sense. More specifically
we have the following,

$$K_h(t) = \begin{cases} L_h(t) + O(h^{s-3}), & \text{if } \mu \in \mathcal{F}_s([0, 1], \lambda) \text{ and } t \in D, \\ O(h^{s-3}), & \text{if } \mu \in \mathcal{C}_s \text{ or } \mu \in \mathcal{F}_s([0, 1], \lambda) \text{ and } t \notin D. \end{cases} \tag{14}$$
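
The localisation identity can be checked numerically in a case where the remainder vanishes. The sketch below makes several assumptions for illustration: the regression function is the piecewise-linear $\mu(x) = a\max(x - \lambda, 0)$, for which $\mu^{(1)}$ is constant on each side of the kink so that $K_h(t) = h^{-2}K_1((\lambda - t)/h)\,a$ exactly; the kernel is the $k = 1$ member of the class, read here as $K(x) = \tfrac{315}{512}(1 - x^2)^4$; and the integral is approximated by a composite Simpson rule. The function names are hypothetical.

```python
C = 315.0 / 512.0   # assumed k = 1 kernel: K(x) = C * (1 - x^2)^4 on [-1, 1]


def K1(x):
    """First derivative of the assumed kernel, zero outside [-1, 1]."""
    return -8.0 * C * x * (1.0 - x * x) ** 3 if abs(x) <= 1.0 else 0.0


def K3(x):
    """Third derivative of the assumed kernel, zero outside [-1, 1]."""
    return 48.0 * C * x * (1.0 - x * x) * (3.0 - 7.0 * x * x) if abs(x) <= 1.0 else 0.0


def simpson(f, a, b, m=2000):
    """Composite Simpson rule with m (even) panels."""
    if a == b:
        return 0.0
    step = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4.0 if i % 2 else 2.0) * f(a + i * step)
    return total * step / 3.0


def probe(t, h, lam, jump):
    """K_h(t) = h^-3 * int_{-1}^{1} K3(x) mu(t + h x) dx for mu(x) = jump * max(x - lam, 0)."""
    mu = lambda x: jump * max(x - lam, 0.0)
    f = lambda x: K3(x) * mu(t + h * x)
    cut = min(max((lam - t) / h, -1.0), 1.0)     # split the integral at the kink
    return (simpson(f, -1.0, cut) + simpson(f, cut, 1.0)) / h ** 3


def localisation(t, h, lam, jump):
    """L_h(t) = h^-2 * K1((lam - t) / h) * jump."""
    return K1((lam - t) / h) * jump / h ** 2
```

For $t$ within $h$ of the kink the two functions agree up to quadrature error, and for $t$ further away both are zero, since a locally linear $\mu$ is annihilated by the vanishing moments of $K_3$.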

As seen in all three of the aforementioned papers that use the zero-crossing
technique, the $\delta$-separation rate Lemma given below is the technical result
that explains why the above representation is effective.

Lemma 1 ($\delta$-separation rate). Let $K \in \mathcal{K}_s$ and $\mu \in \mathcal{F}_s([0, 1], \theta)$. In what
follows the constant $0 < C_0 < 1$ depends only on the kernel $K_1(\cdot)$. Let $h > 0$, $\delta > 0$
be such that $\delta < C_0 h$. Let $A_{\delta,h} := \{t : \delta < |t - \theta| < C_0 h\}$. Then for $K_h(t) = K_h(t, \mu)$:

(a) $|K_h(\theta)| \le C h^{s-3}$,

(b) for all $t \in A_{\delta,h}$ and $\delta > C h^{s}$, $|K_h(t)| \ge C \delta h^{-3}$,

(c) for all $t \in (0, 1)$ such that $|\theta - t| > h$, $|K_h(t)| \le C h^{s-3}$.

The proof of this Lemma is given in Cheng and Raimondo (2008). Their proof
requires a minor correction as the extra regularity condition 3 is needed in the
smoothness class $\mathcal{F}_s(\theta)$.

The main idea of Lemma 1 allows us to exploit the expansion given in (12) 
and focus in on the location of the kink. The kernel function has specific prop- 
erties to guarantee that a unique global maximum and minimum occurs within 
order h of the kink point. Furthermore, the estimator was constructed so that 
the rate of convergence of kink location estimation is minimax for model (4). 
We will seek to adapt these results to the random design setting. 



4.2. Adapted Random Design Estimator of the third derivative

Now consider $\mu \in \mathcal{F}_s(\mathcal{X}, \theta)$ in model (1). An estimator is constructed to exploit
the smoothed third derivative of $\mu$ and the argument built around Lemma 1 discussed
in Section 4.1. The most natural extension would be to use the estimator,

where $\hat f_X(t)$ is the estimate for the density of $X_i$ at the point $t$ given by,

i=l ^ 

Unfortunately, from a brief computational investigation, the estimator given
in (15) appears to suffer from poor numerical performance. Instead of using
(15), another estimator is constructed by rescaling the design variables by the
distribution function $F$ and $K_h(t)$ is estimated in the random design setting by,

$$\hat K_h(t) := \frac{1}{n h^4}\sum_{i=1}^{n} Y_i\, K_3\Big(\frac{F(X_i) - t}{h}\Big). \tag{16}$$

This estimator was chosen since it also is a proxy for the fixed design estimator
given in Section 4.1 and seems to exhibit better numerical performance than
(15). The estimator given in (16) is also an unbiased estimate of the smoothed
third derivative,

$$E\hat K_h(t) = h^{-4} E\Big[\mu(X_1) K_3\Big(\frac{F(X_1) - t}{h}\Big)\Big] = h^{-4}\int_{\mathcal{X}} \mu(u) K_3\Big(\frac{F(u) - t}{h}\Big)\,dF(u) = h^{-4}\int_0^1 \mu_F(x) K_3\Big(\frac{x - t}{h}\Big)\,dx = K_h(t, \mu_F), \tag{17}$$

where $\mu_F(\cdot) = \mu(Q(\cdot))$. If $\mu \in \mathcal{F}_s(\theta)$, then $\mu_F \in \mathcal{F}_s([0, 1], \lambda)$ where $\theta = Q(\lambda)$.
In (17), the observed quantity is the smoothed third derivative of $\mu_F$, which,
coupled with Lemma 1 and the argument shown in Section 4.1, is equivalent to
estimating a kink location $\lambda$ for the function $\mu_F$ in the fixed design setting.
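
As a rough illustration of the rescaled estimator, the sketch below evaluates $(nh^4)^{-1}\sum_i Y_i K_3((F(X_i) - t)/h)$ at a single point $t$. It is a minimal sketch under stated assumptions rather than the authors' implementation: the kernel is the $k = 1$ member of the class, read here as $K(x) = \tfrac{315}{512}(1 - x^2)^4$; the distribution function is passed in as a callable `cdf`; and `empirical_cdf` is offered as a possible stand-in when $F$ is unknown, which is an assumption of the sketch. All function names are hypothetical.

```python
import bisect

C = 315.0 / 512.0   # assumed k = 1 kernel: K(x) = C * (1 - x^2)^4 on [-1, 1]


def K3(x):
    """Third derivative of the assumed kernel, zero outside [-1, 1]."""
    return 48.0 * C * x * (1.0 - x * x) * (3.0 - 7.0 * x * x) if abs(x) <= 1.0 else 0.0


def k_hat(t, h, xs, ys, cdf):
    """(n h^4)^-1 * sum_i Y_i * K3((F(X_i) - t) / h), with F supplied as `cdf`."""
    n = len(xs)
    return sum(y * K3((cdf(x) - t) / h) for x, y in zip(xs, ys)) / (n * h ** 4)


def empirical_cdf(xs):
    """F_n(x) = n^-1 * #{i : X_i <= x}; using it in place of F is an assumption here."""
    ordered = sorted(xs)
    n = len(ordered)
    return lambda x: bisect.bisect_right(ordered, x) / n
```

With noise-free responses on a fine uniform design and the identity as `cdf`, the sum is a Riemann approximation of the fixed design integral, so `k_hat` should then be close to the localisation term $h^{-2}K_1((\lambda - t)/h)[\mu^{(1)}](\lambda)$ for a piecewise-linear $\mu$.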

Therefore with the above argument, an estimator of a kink location of the
regression function $\mu$ in the random design setting is constructed that is approximately
the same as the estimator for kink location $\lambda$ of $\mu_F$ in the fixed
design setting. This is done by estimating the value of $\lambda$ by $\hat\lambda$ using the established
zero-crossing technique in the fixed design setting and then rescaling $\hat\lambda$
back by the quantile function to obtain an estimate of $\theta$. Thus to assess the
performance of our estimator we need to check that the convergence of $\hat K_h(t)$ to
$K_h(t)$ is sufficiently fast. To do this consider the two following processes,

$$\gamma_i(t) = \mu(X_i) K_3\Big(\frac{F(X_i) - t}{h}\Big), \qquad \zeta_i(t) = \sigma(X_i) K_3\Big(\frac{F(X_i) - t}{h}\Big). \tag{18}$$

With these definitions, the overall accuracy of the estimator can be decomposed
into,

$$\hat K_h(t) = K_h(t) + b_h(t) + Z_h(t), \tag{19}$$

where $b_h(t)$ and $Z_h(t)$ represent the respective stochastic bias and stochastic
error contributions to the estimator and are given by,

$$b_h(t) = n^{-1}h^{-4}\sum_{i=1}^{n}\big(\gamma_i(t) - E\gamma_i(t)\big), \qquad Z_h(t) = n^{-1}h^{-4}\sum_{i=1}^{n}\zeta_i(t)\,\varepsilon_i.$$

The analysis of the above terms is given in the next subsection.
4.3. Probabilistic Behaviour for the Adapted Estimator

In this section the stochastic bias and stochastic error terms are analysed
before proceeding to the next stage of the zero-crossing technique, to
ensure that the stochastic contributions do not overwhelm the signal generated
by the $K_h(t)$ term. The proofs of the claims in this section will be deferred to
Section 5.

The first term to be considered is the stochastic bias term, which did not appear
in previous kink analyses pursued by Cheng and Raimondo (2008); Wishart
(2009). It arises because adapting the fixed design estimator to the random design
framework introduces an additional stochastic contribution. Therefore, this term needs to
be appropriately dealt with and the next Lemma is a useful tool that considers
this term.

Lemma 2. Consider a function $\mu : \mathcal{X} \to \mathbb{R}$ such that $\mu'$ exists and is bounded.
Then define the function

$$\gamma_i^*(t) = \big(\mu(X_i) - \mu_F(t)\big) K_3\Big(\frac{F(X_i) - t}{h}\Big).$$

If the design variables follow Assumption (A) then,

$$\sup_{t \in (0,1)} \Big|\sum_{i=1}^{n}\big(\gamma_i^*(t) - E\gamma_i^*(t)\big)\Big| = O_{a.s.}\Big(\sqrt{n h^3 |\log h|}\Big).$$

If the design variables follow Assumption (B) then,

$$\sup_{t \in (0,1)} \Big|\sum_{i=1}^{n}\big(\gamma_i^*(t) - E[\gamma_i^*(t) \mid \mathcal{F}_{i-1}]\big)\Big| = O_p\Big(\sqrt{n h^3 |\log h|}\Big).$$

The two claims given in Lemma 2 follow from the uniform law of the
iterated logarithm for independent variables and a similar iterated logarithm
result for martingale difference sequences.

We now state some central and non-central limit theorems for the estimator
$\hat K_h(t)$. The convergence of the estimator $\hat K_h(t)$ under both Assumption (A)
and (B) is contingent on the size of the bandwidth relative to the level of dependence
$\alpha$. The specific details of this relationship between $h$ and $n^{\alpha}$ will be
shown in detail inside the Theorems. Roughly speaking, if the bandwidth is too
'large' compared to $\alpha$ then the dependence of the random variables dominates
and the estimator converges to a process that needs to be normed by a sequence
that relies on $\alpha$. Conversely, if the bandwidth is 'small' compared to $\alpha$ then
the dependence of the random variables is negligible and a regular central limit
theorem holds with a norming sequence that is not reliant on $\alpha$. In the forthcoming
Theorems the extra smoothness of the regression and variance functions
is exploited to obtain an estimator that is not as sensitive to the
level of dependence. In practice, this extra level of smoothness will most likely
be unknown. Due to its common occurrence in the subsequent Theorems, define
the asymptotic variance term, $v^2(t) := \big(\sigma_F^2(t) + \mu_F^2(t)\big)\int_{-1}^{1} K_3^2(x)\,dx$, where $\sigma_F(\cdot) = \sigma(Q(\cdot))$. The
following Theorem deals with the case of Assumption (A).

Theorem 2. Let K e .^s/\r, M G ■^s, c €'^r with s A r > 3 and t £ {h,l — h). 
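Also if the design variables and error random variables follow Assumption (A)
and the bandwidth $h = h(n)$ also satisfies,

$$h^{2(s \wedge r) + 1}\, n^{1 - \alpha_\varepsilon} L^2(n) \to 0 \quad \text{as } n \to \infty, \tag{A1}$$

then the following convergence result holds,

$$\sqrt{n h^7}\,\big(\hat K_h(t) - K_h(t)\big) \xrightarrow{d} N\big(0, v^2(t)\big). \tag{20}$$

Conversely, if the bandwidth $h = h(n)$ satisfies,

$$h^{2(s \wedge r) + 1}\, n^{1 - \alpha_\varepsilon} L^2(n) \to \infty \quad \text{as } n \to \infty, \tag{A2}$$

then,

$$n^{\alpha_\varepsilon/2}\, h^{3 - (s \wedge r)}\, L^{-1}(n)\,\big(\hat K_h(t) - K_h(t)\big) \xrightarrow{d} N\big(0, C_1^2 v_1^2(t)\big),$$

where

$$v_1(t) = \frac{\sigma_F^{(s \wedge r)}(t)}{(s \wedge r)!}\int_{-1}^{1} x^{s \wedge r} K_3(x)\,dx.$$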

Theorem 3 and Theorem 4 deal with the case under Assumption (B) and give the
central limit theorems when there is a 'small' or 'large' bandwidth respectively.
In the 'large' bandwidth scenario a stronger assumption is used whereby the 
design variables are a causal LRD Gaussian linear process. 

Theorem 3. Let K £ ^s/\r, M G ^s, o with s A r > 3 and t G {h,l — h). 

If the design variables and error random variables follow Assumption (B) and 
the bandwidth h — h{n) satisfies, 

/i^n^""-L2(n) ^ asn^oo, (Bl) 

then the estimator obeys the following law, 

$$\sqrt{n h^7}\,\big(\hat K_h(t) - K_h(t)\big) \xrightarrow{d} N\big(0, v^2(t)\big). \tag{21}$$

Theorem 4. Let K G J^^Ar, M G a with s A r > 3 and t G {h,l — h). 

Assume the design variables and error random variables follow Assumption (B)
and that the design variables are a causal LRD Gaussian linear process. If the
bandwidth $h = h(n)$ satisfies,



L'^[n) — >■ cxj as n oo, (B2) 
and the estimator $\hat K_h(t)$ has a Hermite rank of 1 then the estimator obeys
the following law,



L{n) 



whe 



re 



s\ = 1 ~ f ^ tj) and $ are the standard normal density and cumulative 
distribution functions respectively. 

Remark 1. If the estimator Khit) has Hermite rank q for some q G {2,3,...} 
then the asymptotic distribution depends on the size of the bandwidth relative 
to qa. Firstly, if n^^"^"^^ L'^'^iji) — oo then it can be shown using a simi- 
lar argument used in the Proof of Theorem 4 with the result of Theorem 2 of 

Avram and Taqqu (1987) that the normed process rt''"^/^L^*(n) (nhit) — K/i(t)) - 
'Hq{t)M'q where. 



.^ar,(f>{<^ ^(t)) Jjg^^ \ sx J ^ \ sx / \ur, 



u 



Or. 



and Hq(x) is the Hermite polynomial of degree q and Jifq is the Hermite- 
Rosenblatt process, 



where B denotes a standard Brownian motion. In Avram and Taqqu (1987), 
they considered Appell polynomials for a generalised sequence of stationary LRD 
random variables. In our case the LRD variables are Gaussian and consequently 
the Appell polynomials reduce to the Hermite polynomials. On the other hand, 
if the bandwidth satisfies n^-'>°'-h'' L'^'i{n) then (21) holds. 

As will be seen in Section 4.5, some large deviations results are needed to be
able to distinguish between the signal generated by the $K_h(t)$ term and the
stochastic bias and noise contributions. Unfortunately, only a slightly weaker large
deviations result is proved under Assumption (A) in Theorem 5. In particular
we assume that the scale function, $\sigma(\cdot) = \sigma$, is constant; however, this restriction
could possibly be relaxed by using a different method. The large deviations
result for Assumption (B) in Theorem 6 does not carry this restriction and the
scale function need not be constant.

Theorem 5. Let $K \in \mathcal{K}_{s \wedge r}$ and the design and error variables satisfy Assumption (A).
Further assume that the bandwidth $h = h(n)$ also satisfies,

llog^l' _^ L^in)\logh\' 

— 1 4 > U as n ^ oo. (22) 
Then define,



1=1 

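Also define,

$$B_n(x) = \sqrt{2\log n} + \frac{x}{\sqrt{2\log n}} - \frac{1}{\sqrt{2\log n}}\Big(\frac{1}{2}\log\log n + \log\big(2\sqrt{\pi}\big)\Big) \tag{23}$$

and a partition of $[0, 1]$,

$$T_n = \{t_j = 2jh,\ j = 1, \ldots, m_n - 1\} \tag{24}$$

where $m_n = \lfloor 1/(2h) \rfloor$. Then,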



lim P( sup 5;;^(0 < S™„(x) ) =e 



/or all X e 



Theorem 6. Let $K \in \mathcal{K}_{s \wedge r}$ and the design and error variables satisfy Assumption (B)
and assume that the bandwidth $h = h(n)$ also satisfies,

^ + + /,^(-Hi, ^ asn^oo. (25) 

Then define, 

vit.jVnh ~[ \ J 
with Bn{x) and Tn defined by (23) and (24) respectively, then. 



lim P ( sup |S',f (0| < S,«„(a;) ) = e~ 



for all X £ 



4.4. Localisation Step

Recall from (12) that the probe function given by $K_h(t)$ gives a signal from the
localisation term, $L_h(t)$, with some approximation error and the estimator adds
a stochastic bias and error term,

$$\hat K_h(t) = L_h(t) + O(h^{s-3}) + Z_h(t) + b_h(t). \tag{26}$$

Clearly, $h^{-2} > h^{s-3}$, since $s \ge 3$. So to be able to discern the signal generated
from $L_h(t) = O(h^{-2})$, it is required that $L_h(t)$ dominates the stochastic terms,
$Z_h(t)$ and $b_h(t)$.
By construction of the kernel function (cf. Cheng and Raimondo (2008)),
$K_1(\cdot)$ has a unique minimum and maximum in the interval $[-1, 1]$, so that
$K_1(\cdot/h)$ has a unique minimum and maximum in an interval of length $O(h)$.
Consequently, $L_h(\cdot)$ has unique extrema near $t^* = \theta + O(h)$ and $t_* = \theta - O(h)$.
As in the fixed design scenario considered by Cheng and Raimondo (2008);
Wishart (2009) define,

$$t_* := \arg\min_{t \in (0,1)} L_h(t), \qquad t^* := \arg\max_{t \in (0,1)} L_h(t).$$

However, in practice the locations of $t_*$ and $t^*$ are not known and are estimated using
$\hat K_h(t)$ with,

$$\hat t_* = \arg\min_{t \in (0,1)} \hat K_h(t), \qquad \hat t^* = \arg\max_{t \in (0,1)} \hat K_h(t).$$

If $\mu_F \in \mathcal{F}_s([0, 1], \lambda)$ then,

$$|L_h(t^*)| + |L_h(t_*)| \ge C h^{-2}. \tag{27}$$
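
The localisation step can be sketched on a grid as follows. This is a minimal illustration under stated assumptions, not the procedure analysed in the paper: the probe values are assumed to be available on an increasing grid in $(0, 1)$, the sign change between the two extrema is located by simple linear interpolation as a stand-in for the zero-crossing step, and the function names and the `quantile` argument (playing the role of $Q$) are hypothetical.

```python
def locate_extrema(grid, values):
    """Indices of the minimiser and maximiser of the probe values on the grid."""
    i_min = min(range(len(grid)), key=values.__getitem__)
    i_max = max(range(len(grid)), key=values.__getitem__)
    return i_min, i_max


def zero_crossing(grid, values, i_a, i_b):
    """First sign change between two grid indices, located by linear interpolation.

    Illustrative stand-in only; returns None when no sign change is found.
    """
    lo, hi = sorted((i_a, i_b))
    for i in range(lo, hi):
        a, b = values[i], values[i + 1]
        if a == 0.0:
            return grid[i]
        if a * b < 0.0:
            return grid[i] + (grid[i + 1] - grid[i]) * a / (a - b)
    return None


def kink_estimate(grid, values, quantile):
    """Map the located crossing on the [0, 1] scale back through the quantile function."""
    i_min, i_max = locate_extrema(grid, values)
    lam_hat = zero_crossing(grid, values, i_min, i_max)
    return None if lam_hat is None else quantile(lam_hat)
```

Applied to noise-free values of the localisation term, the two extrema sit on either side of the kink at a distance of order $h$, and the crossing between them recovers the kink location on the rescaled axis.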

There are two respective bandwidth restrictions, ((A1), (A2); (B1), (B2)), for
the asymptotic behaviour of the estimator under each of Assumption (A)
and Assumption (B) respectively. Starting with (A1) and (B1), to have a well
defined signal, it is required that $h^{-2} > C n^{-1/2} h^{-7/2} \Leftrightarrow h > C n^{-1/3}$. Furthermore,
since it is assumed that $s \wedge r \ge 3$, to ensure that (20) and (21) always hold it
suffices to choose $h$ such that $h < C n^{-(1 - \alpha_\varepsilon \wedge \alpha_X)/7 - \delta}$ for some $\delta > 0$ or,

$$C n^{-1/3 + \delta} < h < C n^{-1/7 - \delta} \tag{28}$$

for some $\delta > 0$. With this choice, the bandwidth restrictions given by (A1) and
(B1) will always hold.
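
The lower bound on the bandwidth can be traced through a short heuristic calculation. It is sketched here under the assumption, suggested by the norming in (20) and (21), that the stochastic terms $Z_h(t)$ and $b_h(t)$ are of order $n^{-1/2}h^{-7/2}$, and is meant as an informal check rather than part of the proofs:

```latex
\begin{align*}
\underbrace{h^{-2}}_{\text{signal } L_h(t)} \gg
\underbrace{n^{-1/2} h^{-7/2}}_{\text{stochastic terms}}
\iff h^{3/2} \gg n^{-1/2}
\iff h \gg n^{-1/3}.
\end{align*}
```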

It is worth noting that under this choice, the order of the stochastic terms
does not involve $\alpha_X$ or $\alpha_\varepsilon$, the level of dependence. Note that $h$ is chosen in
a very similar manner to the case where $\varepsilon_i$ and $X_i$, $i \ge 1$, are i.i.d. Consequently, there will
be no influence of the (long range) dependence on the change point estimation.
The long range dependence will only affect the testing step, through
the threshold used to determine if a signal is genuine, and this will be discussed
in the next subsection.



4.5. Kink Detection step

For simplicity in notation, assume that $[\mu_F^{(1)}](\lambda) > 0$, which means $t_* < t^*$ (a
similar argument follows if $[\mu_F^{(1)}](\lambda) < 0 \Rightarrow t_* > t^*$). To detect a kink, first
standardise the statistic $\hat K_h(t)$ to have unit variance. This will allow us to
notice if there is a change-point present when the observed extrema
of $\hat K_h(t)$ exceed the threshold for the noise process. Define this standardised
process as,

$$T_{\hat K}(t) := \frac{\sqrt{n h^7}}{v(t)}\,\hat K_h(t). \tag{29}$$

Then by (26) and (29) the $T_{\hat K}(t)$ process has expansion,

n2/i2 , ^ , ^ ,,,, , ^ 

Tk = -^Lh{t) + o{n^h^) + —r— {Zh{t) + bh{t)) . 30 
v[t) v(t) 

As seen earlier, the information regarding a kink is generated by the $L_h(t)$
process. A thresholding regime is used to distinguish the signal generated by
$L_h(t)$ from the noise generated by the $Z_h(t)$ and $b_h(t)$ terms. This
thresholding is split into two scenarios, one for Assumption (A) and one for
Assumption (B).

Begin by firstly giving a general decomposition of the estimator for both cases 

by using, 7*(i) = (/i(X,) - mf(0)^3 {^^^) = l.{t) + (™^) 
and using (18) and (19). So, 

\/n}i7 1 " 

v{t) v{t)Vnh ^ 



(31) 



First assume $\sigma(X_i) = \sigma$, constant, and focus on Assumption (A). By an
application of Lemma 2 and (10),

%{t) = ^n^{t) + Oa.s. {\\0gh\) + S^{t). 

vit) 

From Theorem 5, it is known that /S'^(i) will diverge to infinity no faster than 
\/2|log2/i|. Also, if /i e then from (14), Kh{t) = 0{h''^) and 



$$\lim_{n \to \infty} P\left( \sup_{t \in T_n} T_K(t) > \sqrt{2 |\log 2h|} \right) = 0. \qquad (32)$$

However, if /i e (6*), then (27) holds and by (30), rnaxtg(t^ j.) 7K(t) > 
Cn^0 > y/2 |log 2h\ and a kink is detected when. 



$$\max_t |T_K(t)| > \sqrt{2 |\log 2h|}. \qquad (33)$$
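The detection rule in (33) amounts to comparing the largest absolute value of the standardised process with the threshold $\sqrt{2|\log 2h|}$. A minimal sketch of that comparison, assuming the standardised process has been evaluated on a finite grid (the function name `kink_detected` and the sample values are illustrative assumptions, not the authors' code):

```python
import math
from typing import Sequence


def kink_detected(t_stats: Sequence[float], h: float) -> bool:
    """Apply the thresholding rule max |T(t)| > sqrt(2 |log(2h)|).

    t_stats : values of the standardised process on a grid (assumed given)
    h       : bandwidth, a small positive number
    """
    if h <= 0.0:
        raise ValueError("bandwidth h must be positive")
    if not t_stats:
        raise ValueError("t_stats must be non-empty")
    threshold = math.sqrt(2.0 * abs(math.log(2.0 * h)))
    return max(abs(v) for v in t_stats) > threshold


# With h = 0.05 the threshold is sqrt(2 * |log 0.1|), roughly 2.15.
quiet = kink_detected([0.3, -1.0, 1.9], h=0.05)   # below the threshold
signal = kink_detected([0.3, -4.2, 3.8], h=0.05)  # above the threshold
```

The threshold grows only logarithmically as the bandwidth shrinks, so a genuine signal of polynomial order in $n$ eventually dominates it.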

A very similar argument holds for Assumption (B). In this case assume that 
the scale function a € with r > 3 and proceed as before. In conjunction with 
(31) and (10) apply Lemma 2, 



Tn{t) = ^^hit) + (t) + O, (/iv^b^) 



vit) 

-Kh{t)+S^it)+Op{l). 



vit) 
The bandwidth restriction (28) guarantees that (25), and consequently Theorem 6,
holds. Then for Assumption (B) the same argument applies that was used to
show (32) for Assumption (A).

This thresholding technique does raise some restrictions that could possibly 
be removed by another technique. Recall from (28), that h > Cn~^^^ for some 
(5 > is required to be able to distinguish the signal from the stochastic terms. 
Also, (22) and (25) are required to be able to apply Theorem 5 and Theorem 6 
respectively and obtain a large deviation result for the process. Therefore to 
ensure both conditions arc satisfied, it is sufficient to consider a^; > ^ or > |. 



4.6. Zero Crossing Technique

The idea behind the zero-crossing technique is that within the interval $A_h =
[\hat{t}_*, \hat{t}^*]$, $\hat{K}_h(t) \approx K_h(t)$. Using Lemma 1 we can locate the zero-crossing-time of
$K_h(t)$, which occurs at $t = \lambda$, with an accuracy of order $\delta$, $\delta < h$. First minimise
$|\hat{K}_h(t)|$ within the interval $A_h$:

A argmin |K/i(i)| argmin |7i(i)|. 
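The zero-crossing step can be illustrated on a grid: restrict attention to the interval between the two localised extrema and pick the point where the absolute value of the estimated process is smallest. This is a sketch under the assumption that the process is available on a grid; `zero_crossing` and the numbers below are hypothetical.

```python
from typing import Sequence


def zero_crossing(grid: Sequence[float], k_hat: Sequence[float],
                  t_lower: float, t_upper: float) -> float:
    """Return the grid point in [t_lower, t_upper] minimising |k_hat|."""
    lo, hi = sorted((t_lower, t_upper))
    candidates = [(abs(k), t) for t, k in zip(grid, k_hat) if lo <= t <= hi]
    if not candidates:
        raise ValueError("no grid points inside the search interval")
    # min over (|k_hat(t)|, t) pairs; ties are broken by the smaller t.
    return min(candidates)[1]


# Toy values: the process changes sign between t = 0.40 and t = 0.45.
grid = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60]
k_hat = [-1.2, -0.8, -0.35, 0.05, 0.4, 0.9, 1.1]
lambda_hat = zero_crossing(grid, k_hat, 0.30, 0.60)
```

The grid spacing plays the role of the accuracy $\delta$ here, so it should be chosen smaller than the bandwidth $h$.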

By comparing (12) with the bounds in Lemma 1 we see that the minimum is 
well defined if, 

Sh-^>Ch'-^ and 6h-^ > Cn-^h-i . (34) 

We will obtain the best possible accuracy if we choose 5 as small as possible, 
as long as both inequalities of (34) still hold. The left hand expression of (34) 
implies that 5 ^ and substituting this into the right hand expression of (34) 
we derive the order of the smallest possible bandwidth 

We now apply Lemma 1 with 5* = h% to locate the change point Xvajxp with 
an accuracy of order, 

A - A (5* = /if X n^T^ . 



4.7. Modified Estimator of Kink



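Recall that $\theta = Q(\lambda)$. In practice the true distribution function $F$ is unknown,
so it is estimated in the usual manner by the empirical distribution function
$F_n(x) = n^{-1} \sum_{i=1}^{n} 1_{\{X_i \le x\}}$, and consequently an estimator of $Q$ is obtained via
the empirical quantile function $Q_n(\cdot)$. Estimate $\theta$ by $\hat{\theta} = Q_n(\hat{\lambda})$. The rate of
convergence of this estimator is evaluated below,

$$\begin{aligned}
|\hat{\theta} - \theta| &= |Q_n(\hat{\lambda}) - Q(\lambda)| \\
&\le |Q_n(\hat{\lambda}) - Q(\hat{\lambda})| + |Q(\hat{\lambda}) - Q(\lambda)| \\
&\le |Q_n(\hat{\lambda}) - Q(\hat{\lambda})| + L_Q |\hat{\lambda} - \lambda| \\
&\le |Q_n(\hat{\lambda}) - Q(\hat{\lambda})| + O_p(n^{-s/(2s+1)}). \qquad (35)
\end{aligned}$$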

The rate of convergence in (35) is therefore contingent on the maximum of the
rate from the generalised quantile process for the design variables and the rate
from the initial unscaled kink estimator. Under Assumption (A), the quantile
process involves independent and identically distributed design variables and for
all $t \in (0, 1)$,

$$|Q_n(t) - Q(t)| = O_p(n^{-1/2}) \qquad (36)$$

(see Csörgő (1983) and references therein for a detailed treatment). For Assumption (B)
the rate is dependent on $\alpha_X$ and for all $t \in (0, 1)$,

$$|Q_n(t) - Q(t)| = O_p(n^{-\alpha_X/2} L(n)) \qquad (37)$$

(see Theorem 5.1 of Ho and Hsing (1996)). Therefore, using (36) and (37) in
(35),

$$|\hat{\theta} - \theta| = \begin{cases} O_p(n^{-s/(2s+1)}), & \text{under Assumption (A)}, \\ O_p\left(n^{-s/(2s+1)} \vee (n^{-\alpha_X/2} L(n))\right), & \text{under Assumption (B)}, \end{cases}$$

which proves Theorem 1.
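The final mapping from the estimate on the quantile scale back to the design scale can be sketched as follows. This is an illustration only: it assumes the left-continuous inverse of the empirical distribution function as the empirical quantile function, and the names `empirical_quantile`, `sample` and `lambda_hat` are hypothetical.

```python
import math
from typing import Sequence


def empirical_quantile(sample: Sequence[float], p: float) -> float:
    """Empirical quantile Q_n(p) = inf{x : F_n(x) >= p} for 0 < p <= 1.

    With n observations this is the ceil(n * p)-th order statistic.
    """
    if not sample:
        raise ValueError("sample must be non-empty")
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    ordered = sorted(sample)
    k = math.ceil(len(ordered) * p)
    return ordered[k - 1]


# Toy design sample and a hypothetical kink estimate on the quantile scale.
sample = [0.9, 0.1, 0.5, 0.3, 0.7]
lambda_hat = 0.5
theta_hat = empirical_quantile(sample, lambda_hat)
```

Other conventions for the empirical quantile (for example interpolated versions) differ from this one by at most one order statistic, which should not affect the rates discussed above.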

Remark 2. The method can be extended to the multiple kink scenario by observ- 
ing multiple instances of (33). For each instance of (33) there is a correspond- 
ing interval Ah and the localisation and zero-crossing-time steps are executed 
on each of those intervals to produce an estimate for each kink location. The 
interested reader is referred to Cheng and Raimondo (2008); Wishart (2009) for 
a more detailed treatment of the method in the multiple kink scenario. 



5. Mathematical Appendix 

Before giving the proofs, some notation is described. Let $X$ denote a random
variable and denote the $L_p$-norm $\|X\|_p = (E|X|^p)^{1/p}$ and $\|\cdot\| = \|\cdot\|_2$. For a function
$f : \mathcal{X} \to \mathbb{R}$ denote the sup-norm $|f|_\infty = \sup_{x \in \mathcal{X}} |f(x)|$. Throughout this
Section a Taylor expansion of composite functions will be used to exploit the
vanishing moment condition of $K_3$. For the Taylor expansion to be well defined,
the derivatives of the composite functions need to exist. A generalised
chain rule for composite functions exists (see the Faà di Bruno formula from
Hernández Encinas, Martín del Rey and Muñoz Masqué (2005) and references
therein), and these are of the form,

$$(f \circ g)^{(n)}(x) = \sum_{k \in \mathcal{K}_n} \frac{n!}{k_1! \cdots k_n!} f^{(k)}(g(x)) \prod_{i=1}^{n} \left( \frac{g^{(i)}(x)}{i!} \right)^{k_i} \qquad (38)$$

where $\mathcal{K}_n = \{k_i \in \{\mathbb{Z}^+ \cup 0\} : k_1 + 2k_2 + \ldots + nk_n = n\}$ and $k = \sum_{i=1}^{n} k_i$. Also,
through tedious but elementary calculus it can be shown that the $n$-th derivative
of $Q = F^{-1}$ will exist, and the Taylor expansions of $\mu_F$ and $\sigma_F$ up to order $n$
will exist, if $f^{(n)}$ exists.
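As a sanity check on the index set in the Faà di Bruno formula, the sketch below enumerates all tuples $(k_1, \ldots, k_n)$ with $k_1 + 2k_2 + \ldots + nk_n = n$ and sums the combinatorial coefficients $n! / \prod_i k_i! (i!)^{k_i}$. When every derivative of the inner and outer function equals one (for instance the outer function $e^x$ and inner function $e^x - 1$ at $x = 0$), the formula reduces to this sum, which is known to be the $n$-th Bell number. The function names are illustrative assumptions.

```python
from math import factorial
from typing import Iterator, List, Tuple


def index_set(n: int) -> Iterator[Tuple[int, ...]]:
    """Yield all (k_1, ..., k_n) of non-negative integers with sum_i i * k_i = n."""
    def extend(i: int, remaining: int, prefix: List[int]) -> Iterator[Tuple[int, ...]]:
        if i > n:
            if remaining == 0:
                yield tuple(prefix)
            return
        for k_i in range(remaining // i + 1):
            yield from extend(i + 1, remaining - i * k_i, prefix + [k_i])
    yield from extend(1, n, [])


def coefficient(ks: Tuple[int, ...]) -> int:
    """Combinatorial weight n! / prod_i (k_i! * (i!)**k_i) of one index tuple."""
    n = sum(i * k for i, k in enumerate(ks, start=1))
    denom = 1
    for i, k in enumerate(ks, start=1):
        denom *= factorial(k) * factorial(i) ** k
    return factorial(n) // denom


def bell_via_faa_di_bruno(n: int) -> int:
    """Sum of all coefficients, i.e. the n-th derivative of exp(exp(x) - 1) at 0."""
    return sum(coefficient(ks) for ks in index_set(n))


bell_numbers = [bell_via_faa_di_bruno(n) for n in range(1, 6)]
```

The number of tuples in the index set equals the number of integer partitions of $n$, so the enumeration stays small for the low orders needed in the Taylor expansions.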

Proof of Lemma 2. Begin with the proof of the first claim under Assumption (A) . 
Since <,i{t) will be non-zero only iiF{Xi) g [t—h, t+h), there exists a G (—1, 1) 
that depends on Xi such that. 



hTiK:. 



F{X, 



h 



and depends on r^. The ^'i(t) terms are independent random variables, each 
of which have variance that is of order h. Therefore by the Law of Iterated 
Logarithm (see Bingham (1986)) we have the following result, 

\misu^^===Y U{t)~¥.v,{t)] =-liminf^ Y {v,{t)-¥.v,{t) 

n^oo V""i0gl0gn^V / n^co -^nft log lOg 71 ^ V 

Therefore we have, 



i=l i=l 

= Ca.s. ( V nh^ log log n 



Oa.s. ( y/nh^ \\ogh\ 



which proves the first claim of the Lemma. Turning to the claim for
Assumption (B), the proof of a similar claim in Lemma 4 of Zhao and Wu (2006)
is adapted to our framework. This technique bounds the martingale difference
sequence $\gamma_i^*(t) - E[\gamma_i^*(t) \mid \mathcal{F}_{i-1}]$ above and below by two discretised martingale
difference sequences and uses an exponential martingale inequality to gain the
required probabilistic bounds. To do this, again exploit the Taylor expansion
of $\mu_F$ in Definition 1 and use the fact that $\operatorname{supp}(K_3) = [-1, 1]$, which means
that there exists a $\tau_i$ dependent on $X_i$ with $|\tau_i| < 1$ such that $F(X_i) = t + \tau_i h$
and,

7*(t) - (^^^^^) {Mt + r.h) - Mt)) Mt^h,t+h) {F{X,)) 

= nhK3 { ^^^'^ ~ + h)l(t-h,t+h) {F{X,)) , (39) 
where |^| < 1. Then split the function in (39) into its positive and negative parts

by defining := t + ^ \n\ h and n^'-p\^i) = (jifi'-p^^,) 

- where f+ = {f W 0), /- 

positive and negative parts of /. Then, 



(— / A 0) denote the respective 



/ F(X.)-t 



( F{X,)-t 



h 



_ (F{Xi)~t 



lit-h,t+h){{F{X,))) 
lit-Kt+h)mX,))) 

(40) 



By the linearity of the conditional expectation operator and (40) we can
decompose the martingale difference sequence into parts,



7*(i)-IE [7* (01-^.-1] 

= ^++(t) - E [c++(t)| -F.-i] - (^+-(t) - E [^+-(t)| -F.-i]) 

- (^r+(i) - E [^,r+(t)| J-,_i]) + ^r-{t) - E [^r^Wl 



(41) 



To begin with we will concentrate on the first martingale difference term on the
RHS of (41) and bound it above and below by a discretised version that does

not depend on t directly. For this discretization let N — \{nh~^^ ^] and tj = 
where $0 \le j \le N$. Then for any $t \in [0, 1]$ there exists a $j$ such that $t \in [t_j, t_{j+1})$
and the distance $|t_{j+1} - t_j| = O(N^{-1})$. Define the two new tweaked martingale
difference sequences versions of (^^^ {t) , 



F{X,) - t, 



K 



It can be shown that, the martingale difference sequence <^++(t)-E [(r++(t) | 
can be bounded uniformly in t above and below by. 



E 



CN-' < ,++it) - E [^++(t)| -F,-i] < - E [^++1 + CN-\ 



We have the following result, 

x:(.++(i)-E[,++(t)i^._i]) 



sup 

t6(0,l) 



1=1 



< ^^max_^ (|5„(j)| + \S^U)\) + CnN-' 



max 

0<j<N- 



^ {\Sn{j)\ + \Sn{j)\)+0(^^nh^\\0gh\"^ 
where for each fixed $j$, $\underline{S}_n(j)$ and $\overline{S}_n(j)$ are martingales with respect to the
filtration $\mathcal{F}_n$ and are defined,



E 



i=l 



These martingales will be bounded by an exponential martingale inequality. 
Consider firstly the martingale S'„(j), its martingale differences are bounded 



-++ 



-+ 



I A', 



property of Q and the bounded domain of K3, 

2 



Cbh. Also using the Lipschitz 



J'i-l 



< 



<2h''LQ\K3\ 



h 



fx{u\ Fi-i) 



C cv ^ • 



Then, a martingale inequality for bounded differences given by Theorem 1.5A
of de la Peña (1999) can be used to yield,

$$P\left( \underline{S}_n(j) > x \right) \le \exp\left\{ -\frac{x}{2a} \sinh^{-1}\left( \frac{ax}{2y} \right) \right\}, \qquad (42)$$

where $a = C_b h$ and $y = C_{cv} n h^3$. Furthermore, if $ax/2y = o(1)$ then using a
Taylor expansion of $\sinh^{-1}$,



sinh 



2yJ 2y 



(43) 
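To see why the Taylor expansion of the inverse hyperbolic sine is useful here, the sketch below evaluates an exponential bound of the form $\exp\{-(x/2a)\sinh^{-1}(ax/2y)\}$ and compares it with the Gaussian-type bound $\exp\{-x^2/(4y)\}$ obtained by replacing $\sinh^{-1}(u)$ with $u$. The specific choices $a = h$, $y = nh^3$ and $x = C_T\sqrt{nh^3|\log h|}$, with all unnamed constants set to one, are assumptions made for illustration and are not values from the paper.

```python
import math


def arcsinh_bound(x: float, a: float, y: float) -> float:
    """Exponential bound exp(-(x / (2a)) * asinh(a x / (2y)))."""
    return math.exp(-(x / (2.0 * a)) * math.asinh(a * x / (2.0 * y)))


def gaussian_type_bound(x: float, y: float) -> float:
    """Leading-order approximation exp(-x^2 / (4y)), valid when a x / (2y) is small."""
    return math.exp(-x * x / (4.0 * y))


# Illustrative (assumed) values: sample size, bandwidth and threshold constant.
n, h, c_t = 10**6, 0.1, 2.0
a = h                      # bound on the martingale differences, constant set to 1
y = n * h**3               # bound on the summed conditional variances, constant set to 1
x = c_t * math.sqrt(n * h**3 * abs(math.log(h)))

u = a * x / (2.0 * y)      # the small quantity in the Taylor expansion
exact = arcsinh_bound(x, a, y)
approx = gaussian_type_bound(x, y)   # equals h ** (c_t**2 / 4) for these choices
```

Because $\sinh^{-1}(u) \le u$ for $u \ge 0$, the approximate bound is never larger than the exact one, and the two agree to leading order when $u$ is small, which is what makes the threshold $x$ scale like $\sqrt{nh^3|\log h|}$.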



Now consider the chance that maxi<j<„ Sn_{j) exceeds the threshold x = CT\/nh^ |lo 
for some Ct > which combined with a = Cbh and y = Ccv^h^ implies, 
ax/2y = O [y'\\ogh\ /nh^ = o(l) and by (42) and (43), 

P {SnU) > CrV^h^l^oghl^ < exp 
So, fix e > and use (44), 



(J2 

^ 'log /i| +0(1) 



4G 



(44) 



max 

0<j<N- 



N-l 



^Sr^ij) > CTV^h3\\ogh\j <Py\J {SnU) > CrVrih^l^oghl] 

N-l 

< E ^ (^(j) ^ CTVnh^\\ogh\) 



C^ 



<iVexp, 



|log/i| exp{o(l)} 



(45) 
Choosing $C_T$ large enough will ensure that C'n3ft,'^T/4Cc„-f ^ ^^ic similar
conclusion can be reached that for any e > there exists a finite constant C 
such that, 



P ( - ^^max ^Snij) > Cy^nh^ \logh\ ) < e. (46) 



Therefore, (45) and (46) ensure that. 



max Sn{j) = Op{y/nh^ \^ogh\). 



Using a comparable argument, the same conclusion can be reached for the Sn{j) 



max 

o<j<A'-i 



\Sn{j)\^Op{^nh^\\ogh\). 



Also, a similar technique can be used to bound the other martingale difference
terms given in (41); details are omitted. □

Proof of Theorem 2. To prove the Theorem we appeal to similar results that 
were shown by Kulik (2008); Wu and Mielniczuk (2002) by decomposing the 
stochastic terms into two parts, a martingale part and a LRD part. This is done 
by defining, 

_ iQjt) - ECi(O) + 7.(0 - E7i(t) 
(VarCi(t)+Var7i(0) 

and then decomposing the standardised estimator 'fih{t) into two terms, 
y/nK'{nh{t)-Kh{t)) = V^[Zh{t) + hh{t)) 

/ n n \ 

= ^[Y. c.(^)^' + E - 1^71 (^)) 



V/i-i (VarCi(t) +Var7i(0)^x.(i) + E 



. Si 

nh . -I 



(47) 

The Theorem will follow by showing that either the first or last term on the RHS 
of (47) dominates under the bandwidth conditions (Al) or (A2) respectively. 
More specifically, it will be shown that the dominating term will follow a CLT 
and the other term converges to zero in probability; then Slutsky's Theorem 
completes the proof. Firstly consider the case where (Al) holds, then apply the 
martingale CLT of Brown (1971) to show, 

n 

Y,Mt)^m,i)- (48) 



Note that $\{\chi_i(t), \mathcal{G}_i\}$ form a martingale difference sequence. So it remains to
check that the sum of the conditional variances converges in probability to the
unconditional sum and that the Lindeberg condition holds. Before we prove the
Lindeberg condition note that for $t \in (h, 1 - h)$,

ECi (t) = j <J^{x)Kl '^^^''^ f <^lit + hu)Kl {u) du. (49) 

Exploiting (10) and the assumption that cr e ^^r, 
ECj(i) j ^F{t + hu)K3 {u) du 

^(sAr) + l 



cr^'^''^(t + T;iu)ii^^'-/V3 (u) du = /i(-'^'^)+iu.(t), (50) 



where t € (0, 1). Therefore, using (49) and (50), 

VarCi(t) nl{t + hu)K^ (u) du j / '^f + Thu)u''^'' Kz (u) du 

J -I ((sAr)!) 

Since the bandwidth is assumed to satisfy $h \in (0, 1)$, there exists
an $h_0$ such that for all $0 < h < h_0$,

ft, inf j^pR |(T^(a;) I ^ , , , , 

VarCi(t) > ^ ^ ^' J ^ Kl iu) du. (51) 

From (50), it follows, h~iECi{t) = o(l) and from (49), h-^ECf{t) cr|,(t) j\ A'| (ti) du 
Therefore, h~^Ya.TCi{t) = h"^ {^C,l{t) - (ECi(t))^) ^ al{t) /j^^ A'| (u) dw. Also, 
the same argument applies for the 7i(t) term to yield, 

h-^ (VarCi(t) + Var7i(t)) '-^ v''{t). (52) 
Now the Lindeberg condition is shown to hold. Let $\varepsilon > 0$ be arbitrary,

n 

^IEx?(Ol{|H.(t)|>.} = nEx?(i)l{|xi(t)|>.} 



E 



(£i (Ci(t) - ECi(t)) + 71 (0 - IE71W)' 1a„ 



(53) 



VarCi(t) + Var7i(i) 

whereA„ = (Ci(t) - ECi(O) + 7iW " E71 (01 > e^" (VarCi(i) + Var7i(0)}. 
The size of this set can be maximised using (51), 



An C 



{2 kil (kloo + ImL) > eV"VarCi(t)} 



C <; 2 |if3L kil (kloo + ImL) > 6. L^^^^^^^i^ A-| {u) du ) . 



(54) 
Using the fact that $nh \to \infty$ and $h \to 0$ as $n \to \infty$ we see that $A_n \to \emptyset$, the
empty set. Consequently (53), (54) and $nE\chi_1^2(t) < \infty$ imply that,



{lx.(t)l>^} 



0, 



i=i 



and the Lindeberg condition holds. As a consequence of (11), letting $\varepsilon > 0$ be
arbitrary,



>e < 



1 



{Cln~^WCln-^"L^{n)) 



= o(l). 



Then by the above, the sum of the conditional variances converges in
probability to one:



ELi VarCi(t) + nVar7i(t) + 2Cov (Ci(t), 7i(t)) ELi ' 
n (VarCi(t) + Var7i(t)) 



and by the martingale CLT, (48) follows. 

Now we show that the last term on the RHS of (47) converges in probability 
to zero. Consider an arbitrary e > 0, then using (50) and (11), 



P 



ECi(i) 



'nh 



E' 



-0(1), 



(55) 



and the last line follows by the bandwidth restriction given in (Al). Thus, the 
proof of the first claim under the 'small' bandwidth scenario holds. 

Consider now the 'large' bandwidth scenario. Using (47), (48) and (50), 



Kh{t) - Khit) = Op (n U + 



Also, from Ho and Hsing (1997), it is known that 

n 



■E' 



n 2 L(n) ^—^ 
Therefore, normalising the expression on (56) 



(56) 



(57) 



{^r.{t) - n.H{^) = Op (/.-^-(^^■^)n-^L-i(n)) + ^^^^^ ^e., 



and the result follows from (A2) and (57) with Slutsky's Theorem 



□ 
Proof of Theorem 3. First break down the estimator into its separate martingale
and LRD parts in a similar fashion to the method employed in the proof of
Theorem 2. Using (31), apply Lemma 2,

1 " 



|logfe| 



)1 " 



^3 



A3 



1 " / 

^ ^ (e [7.(i)l - E7.(t)) + Op 



\\0gh\ 



(58) 



Define the standardised stochastic terms, 



A.(t) 



v{t)\/ nh 



Then in a similar fashion to the Proof of Theorem 2 it will be shown by the 
martingale CLT of Brown (1971) that. 



$$\sum_{i=1}^{n} \Lambda_i(t) \xrightarrow{d} N(0, 1). \qquad (59)$$



Indeed, $\Lambda_i(t)$ is a martingale difference sequence with respect to the $\sigma$-fields
$\{\mathcal{F}_i\}$. Thus we need to check that the Lindeberg condition holds and that the
sum of the conditional variances converges in probability to 1. First, focus on
the convergence of the conditional variances. The conditional variances can be
broken into two parts.



nhv'^(t) E^ 



E 



E 



nh v'^{t) 



/V3 r™^-^ 



(60) 



Dealing with the second term on the RHS of (60), use Lemma 1 of Zhao and Wu
(2008),



- n - n n 

i=l ?'— 1 2—1 



cr|(i + hx)Kl{x) dx + ©^(^"^^(n)) 



c^f(^) / i^3(a;)rfa; + e'(/i^) + e'p(n-tL(n)) (61) 



h 



i-1 



To bound the first term of (60), a bound is required for E 
Define := - rn = fix + J2jLi ^jVi-j and := s3j.^(Xi^i_i - fix) and 

define /^(x) := fx {x\Ti-i) /^(x - Xi^.^i) and 5(2;) = l/x. Tlicn Xi^.^i and 
are -measurable and for all i G {h,l — h) the conditional expectation 
can be evaluated as follows. 



E 



F{Xi) ~ t 



fx {v\Fi-i) dv 



K3 ix)[f^oQ){t + hx) {gofxo Q) {t + hx) dx. 



(62) 

Use a Taylor expansion of the composite functions, p{t) := ^/,, o (t) and 

q{t) := {g o fx ° Q) (t) by using the Faa di Bruno chain rule given in (38); 
starting with the latter Taylor expansion. 



{gofxoQ){t + hx)= 

3=0 



h^X^ {gofxoQi^^ [t) fe^Ar^sAr (g p /^^ p Qf^'^^ ^ 

.7! (sAr)! 

(63) 



(sAr) 



where |t| < 1. The intermediate derivatives for j=0,l,...,sAr are given by 



{gofxo Qf'^ (t) = Yl (-1)'^! ((/^ ° Q)(t))"*'+'^ n 

due to restrictions imposed in Assumption (B). Similarly, 



ik+Dij ( ifxoQf^ 



0{1) 



(fvoQ){t+hx)^ Y 



sAr~l hJ 

E - 

3=0 



[fr, o Qj (t) /i^A'^x-^A'- i^f„ O Q j {t + 5hx) 



(sAr) 



(sAr)! 



(64) 
where $|\delta| < 1$. Therefore, (64) and (63) in (62), together with the vanishing moment
condition (10), imply that,



E 



K'- 



(FVQ 



- 1 



I j=o e=o "'-1 



j+i>sAr 
sAr— 1 /'t w . \ 1 -i ,.1 



E 

sAr— 1 

E 

(MO!? 



(s A r)!j! 



(s Ar)!£! 
1 



2;2(sAr)^^^^)^(sAr-)(^ _^ T/ia;)^^'"''^) (t + (5^) dx 



However, by Assumption (B), $f_\eta^{(j)}$ and $Q$ are Lipschitz continuous for $j =
0, \ldots, s$ and therefore bounded. Consequently $p^{(j)}$ and $q^{(j)}$ are also bounded,
which means that uniformly in $i$,



E K, 



Define, K3{Xi,i^i,t) 



( F{X,)-t 



< Ch 



sAr+1 



a.s. 



(65) 



and5(Xi^j_i,t) -.^K-i (X,.^_i,t)- 



EiiTs (Xi_i_i, t), then E5(Xi^j;_i, i) = and by Jensen's Inequality '¥.K-j,(Xi^i-\,t)^ < 
oo. It will be shown by an application of Theorem 1 of Wu (2007) that J^^^i 
Cp (/i*^''+^n^^ 2"i(n)). Then, define the physical dependence measure, i?i = 
suPte(,,,i_,,) !|E [g{X,^^-i,t)\To] [g{X^,^-ut)\T-l]\\. To bound t?,, let 77^, be 
an i.i.d. copy of 770 and define X*f_i = Xi^i-i — arjo + c^t^q with the associated 
sigma field T* = a (?/;, ?7i„i, . . . , ryi, ryg, ryi, . . . ; ei, . . . , £;). Then by Theorem 1 
of Wu (2005) it was shown that di < supjg(^_^_^) ||g(Xi_i_i, t) — g{X*^_-j^,t)\\. 
Using this, (65) and the Lipschitz property of it will be shown that 7?i < 



Ch 



sAr+2 • 



sup ||5(X,,,_i,t)-.9(X*,_i,t)| 

t£(h,l-h) 



= sup 



(^K3{Xu-l,t)+K3 (X*,_i,i)) (i^3(X,,,_i,t) -if3 (^M^1,0) 
F3(X,,,_i,t)-F3(X*,_i,t) 



sup 

t€{h,l-h) 

sup 

t£{h,l-h) 



K: 



( F{u)-t 



< C/i"^''+i sup / 

te{hj-h) Js 

<Ch''''^+^\\m-v'o\\c^ = Ch 



h 

F{u) - t 



) {fr,{u-X.,,,^,)-f,{u-X*,_,)) 



du 



du \\X. 



X* 
where the last line follows due to the Lipschitz property of $Q$ and the bounded
domain of $K_3$. Then by Theorem 1 of Wu (2007) and Karamata's Theorem,

\\J2Ll9i^i,^-l^t)f = (/l2(''M+4„2-a^2(^))^ ysiug this and (65), 



1 " 
nh ^ 



-.71 1 

— V5(X,,,_i,t) + — VEi^|(X,,,_i,t) 

nh ^ — ^ rj h ^ — ^ 



J / , .J \ I'll' — J^l / ' 7 

nh ^ — ' 



= Op{l) (66) 

Then the first term on the RHS of (60) can be bounded by (66) and a similar 
application of Lemma 1 of Zhao and Wu (2008), 



■E 



( F{X,)-t 



nh ^ \ 



i=i 



E 



Ki. 



( F{Xi)-t 



E 



K3 



■Fi-i 
F{X,) - t 



F{Xi) - t 



= fj.%{t) J K^{x)dx + Op{n-'^L{n))+0{h'^) 
Substituting (67) and (61) into (60) implies that, 

n 



(67) 



For the Lindeberg condition, let $\varepsilon > 0$ and define $A_n = \{|\Lambda_1(t)| > \varepsilon\}$; then,
similarly to the procedure used in the Proof of Theorem 2, it can be shown that
$A_n \to \emptyset$ and the Lindeberg condition holds. Thus by the martingale CLT, (59)
holds and by using (B1) in the decomposition given in (58) the result follows
by Slutsky's Theorem. □

Proof of Theorem 4. Again, use the decomposition (58) used in the Proof of
Theorem 3. Then, define the standardised process,



T,(i) 



E[7,(t)|J-,_i]-E7,(t) 



/i%i-ti(n)-Hi(i) 

It will be shown via use of a Hermite expansion of the LRD variables that. 



^T,(0 Aaa(o,Ci2; 



(68) 
To do this, split the LRD variable $X_i$ into two parts, $X_i = \eta_i + X_{i,i-1}$. Define
the standardised version of $X_{i,i-1}$, $Z_i = s_X^{-1}(X_{i,i-1} - \mu_X)$, $Z_i \sim N(0, 1)$.
Notice that $T_i(t)$ and $Z_i$ are both $\mathcal{F}_{i-1}$-measurable and define $G(Z_i, t) :=
E[\gamma_i(t) \mid \mathcal{F}_{i-1}] - E\gamma_i(t)$. Then clearly, $EG(Z_i, t) = 0$ and by Jensen's
inequality, $EG(Z_i, t)^2 < \infty$. So by Taqqu (1975), $G(Z_i, t)$ can be re-expressed by its
Hermite expansion,

$$G(Z_i, t) = \sum_{m=1}^{\infty} \frac{a_m}{m!} H_m(Z_i),$$

where $a_m = E[H_m(Z_1) G(Z_1, t)]$ is the $m$-th Hermite coefficient. For our case it
is assumed that $a_1 \neq 0$. Evaluating $a_1$,
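To make the Hermite coefficients concrete, the sketch below computes $a_m = E[H_m(Z) G(Z)]$ numerically for a standard normal $Z$, using the probabilists' Hermite polynomials and a simple trapezoid rule. The test function $G(z) = z^3$ is a stand-in chosen for illustration (it is centred and satisfies $z^3 = H_3(z) + 3H_1(z)$); it is not the function $G(Z_i, t)$ of the proof, and the function names are assumptions.

```python
import math
from typing import Callable


def hermite_e(m: int, z: float) -> float:
    """Probabilists' Hermite polynomial via He_{j+1}(z) = z He_j(z) - j He_{j-1}(z)."""
    prev, curr = 1.0, z
    if m == 0:
        return prev
    for j in range(1, m):
        prev, curr = curr, z * curr - j * prev
    return curr


def hermite_coefficient(func: Callable[[float], float], m: int,
                        half_width: float = 10.0, steps: int = 20000) -> float:
    """Approximate a_m = E[He_m(Z) func(Z)], Z ~ N(0, 1), by the trapezoid rule."""
    step = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        z = -half_width + i * step
        weight = 0.5 if i in (0, steps) else 1.0
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += weight * hermite_e(m, z) * func(z) * density
    return total * step


def cube(z: float) -> float:
    return z ** 3


a1 = hermite_coefficient(cube, 1)   # E[Z^4] = 3
a2 = hermite_coefficient(cube, 2)   # 0 by symmetry
a3 = hermite_coefficient(cube, 3)   # 3! = 6
```

Since $a_1 \neq 0$ for this stand-in, its Hermite rank is one, which is the situation assumed in the proof: the first-order term then drives the long-range dependent limit.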



ai = E[ZiG(Zi,i)] = E 



Zi— [ fi{u + nx + sxZi)K^ 

C77 JR 



^{u + sxZi) -t 



du 



^ f f / -.T^ + sxz) -t\ ^, ,J u , , , 
— / / zfi{u + fix + sxzjKs (p{z)(p — dzdu 



fipit + hw)K'i {w) (p ) (/) — 1 dw 



u 



SxCFti MJ-j. (t>{^ ^{t + hw)) \ sx J \(Jr, 

By exploiting the Faa di Bruno formula further, it can be shown via Taylor 
expansions that the asymptotic behaviour of ai satisfies, 

ar - 3 'l:fL,, I <i> f^:^^) ii) - u) ^ (^) du = h^HAt) 
From Corollary 5.1 of Taqqu (1975), 

n n 

^-^ 2 Lin) ^-^ 

Therefore (68) holds by Slutsky's Theorem in the decomposition given in (58)
in conjunction with (59), (68) and (B2). □

Proof of Theorem 5. First, fix fc G N and choose distinct integers < ji, j2, ■ ■ ■ , jk < 
m„. We adapt the proof of Theorem 5 of Zhao and Wu (2008) to our case. 
The proof of their result was reliant on another result given by Theorem 1 
of Grama and Haeusler (2006) which requires a martingale difference sequence 
that has third order moments. We obtain such a sequence below. Define, 

n 

S';^^(t), .4„| 

is a martingale since E/fa ( ) — for all t G {h,l — h). Let Q be the 
quadratic characteristic matrix of .S'^ ^, that is,

n 



<k 



Qrr' — 



i=l 

e(k3 



nh ^/v{tj^)v{tj^,) 



X] ('^^'^^ + {^^F^^Or) + l^F{tj^,)) ere, + f^F{tj,)fJ.F{tj^,)) . 



i=l 



However, by construction, if $r \neq r'$, then $|t_{j_r} - t_{j_{r'}}| \ge 2h$ and the kernel function
$K_3 : [-1, 1] \to \mathbb{R}$, which implies that $\{x \in \mathbb{R} : h^{-1}|F(x) - t_{j_r}| < 1\} \cap \{x \in \mathbb{R} : h^{-1}|F(x) - t_{j_{r'}}| < 1\} = \emptyset$.
Therefore when $r \neq r'$, $Q_{rr'} = 0$. If $r = r'$, then by (11),



\Qr 



Qrr — 



111 < 



1 



1 



E 



2|Mf(ijJI 



E 



ere. 



O (n-7i(n)) 



Let ( 

^rr')i<rr'<A; thc k X k identity matrix. Then by the above argument 
E\Qrr'-Urr'\i = O (^n-^Li{n)^ uniformly over 1 < r,r' < fc. Also, X;r=i IE|*j(OI^ = 
0{n-^h-^). Combining the two yields, T.l=l^^^{^)? +IE|Srr' - Urr'\i = 
0{{nh)~^+n^^ L'^/'^{n)). Considering the asymptotic behaviour of (23), (1 + i?m„ {x))'^ cxp | — 

O [h-^ |log h\^-^ and using (22) it follows that (1 + B™„ {x)f exp | ^\ ^""^ | A„ ^ 
0. Therefore the same framework and argument that was used in the
proof of Theorem 5 of Zhao and Wu (2008) applies and the result follows. □

Proof of Theorem 6. The proof of the Theorem uses a similar result to Theorem 
5 of Zhao and Wu (2008). However, to be able to adapt the result of Theorem 5 
to this case and ensure that S^{t) can be modified into a martingale we add 
and subtract the conditional expectation by defining. 



Sfit) :=^S*(t)=^(s,(t)- 



v{t)\/ nh 



\ h 



With this definition, {Sf [t),T,^} is a martingale and 



E^ 



v(t)Vnh ^ 



(69) 



The proof of the result will follow from Slutsky's Theorem if the first term on
the RHS of (69) follows the extreme value distribution and the last term on the
RHS of (69) converges to zero in probability. From (65) and (25) it follows that



{n^h 



= Oa.s.(l). Now 
turn attention to the first term on the RHS of (69) and apply a similar procedure
to the one used in the proof of Theorem 5. Fix $k \in \mathbb{N}$ and choose distinct
integers < ji , j2 , . . . , jfe < Wn and define, 



Then < f.{t),J^n \ is a martingale. Let Q be the quadratic characteristic 
matrix of 5*,^ j., that is, 

■n 
n 



I y ] 



A'- 



J", 



E 



■Fi-l 



By a domain argument similar to the one presented in the proof of Theorem 5, if
$r \neq r'$, $Q_{rr'} = 0$. If $r = r'$, then,



Qr 



nhv^{t-j 



1 " 



^2 (F{X,)-t,^ 



(70) 



Therefore, using (61) and (67) in (70), 

||Qr,.-l||3 < \\Qrr-l\\=0{5) (71) 

where 6 = n^^ L{n) + h? . Define (urr')i<r r'<k ^'^ /c x A: identity matrix , 

then by (71), uniformly over r,E I Q„., = 0((5i). Also, ^"^^ E |S*(ijJ|^ : 

O (n^ih-i^ which implies J27=l^\-^it)\^ + E\Qrr' - Urr'\^ = 0(A„) where 

A„ — n^'^h^i + n^T^f (rj) -j- /i^. Similarly, due to the bandwidth restriction 

in (22), (1 + Bm„{x))'^ exp |— ^^^y^} ^ *-* ^-'^'^ ^^'-^ same argument in the 
proof of Theorem 5 the result follows. □ 